Skip to content

[Driver][SYCL] Remove object upon failure #18190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: sycl
Choose a base branch
from

Conversation

mdtoguchi
Copy link
Contributor

Typical behaviors when the compilation fails is to have the resulting output object to be removed. Due to the fact that there are multiple compilations that occur during an offload compilation, the actual final output object may not be directly associated with the compile causing it to not be removed.

Update the cleanup file list to remove the output result file regardless of the associated job action when we are performing offload.

@mdtoguchi mdtoguchi requested a review from a team as a code owner April 25, 2025 00:17
Typical behaviors when the compilation fails is to have the resulting
output object to be removed. Due to the fact that there are multiple
compilations that occur during an offload compilation, the actual final
output object may not be directly associated with the compile causing it
to not be removed.

Update the cleanup file list to remove the output result file regardless
of the associated job action when we are performing offload.

Signed-off-by: Michael D Toguchi <[email protected]>
Signed-off-by: Michael D Toguchi <[email protected]>
@@ -2708,7 +2708,11 @@ int Driver::ExecuteCompilation(
// Remove result files if we're not saving temps.
if (!isSaveTempsEnabled()) {
const JobAction *JA = cast<JobAction>(&FailingCommand->getSource());
C.CleanupFileMap(C.getResultFiles(), JA, true);
// When performing offload compilations, the result files may not match
// the JobAction that fails. In that case, do not pass in the JobAction
Copy link
Contributor

@srividya-sundaram srividya-sundaram Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// When performing offload compilations, the result files may not match the JobAction that fails.

Can you add some details on why they don't match?
And also give an example Offload JobAction for this scenario?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated result files are associated with a specific JobAction. One example here is with the old offloading model. Here, the output object from clang++ -fsycl file.cpp -o file.o is generated from the clang-offload-bundler call. Both the host and device compilations have temporary output files, so the JobAction associated with the actual fail (if one fails) including the output file do not match.

Here, the failing action is a BackendCompileJobAction but the file that needs to be removed is from the OffloadBundlingJobAction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the output object from clang++ -fsycl file.cpp -o file.o is generated from the clang-offload-bundler call.

clang++ -fsycl file.cpp -o file.o will trigger both compilation and linking at the same invocation and will not call clang-offload-bundler.

AFAIK, when compilation is separated from linking, the compilation step ends with engaging the offload bundler to generate "fat object" (<host object, device code IR> pair).

Both the host and device compilations have temporary output files, so the JobAction associated with the actual fail (if one fails) including the output file do not match

Can you clarify this?
If host or device compilation fails, there will be no host object or device object file generated right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - yeah -c is needed in that compilation line. Thanks for pointing that out.

Can you clarify this?
If host or device compilation fails, there will be no host object or device object file generated right?

Yes, that is right, no object will be generated. The issue is an existing object that needs to be cleaned up. During the object compilation, the compiler is responsible for cleaning up the output file upon failure. Since the output file in those instances are temporary - there is no association with the actual output file pointed to with -o. The driver knows what the final output object file is, so we need to take the steps to clean up upon failure.

Copy link
Contributor

@srividya-sundaram srividya-sundaram Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I understand it correctly that if the host and device object files have been created, but the clang-offload-bundler fails to produce the fat object, we should still proceed with deleting the host and device object files?
In this case, the failing JobAction is OffloadBundlingJobAction but the host and device object files were created by a different JobAction.

Is it not possible to pass the JobAction associated with generating the host and device object files instead of not passing any JA at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the failing JobAction is OffloadBundlingJobAction but the host and device object files were created by a different JobAction.

Is it not possible to pass the JobAction associated with generating the host and device object files instead of not passing any JA at all?

The JobAction associated with the host/device object generation is the the BackendCompileJobAction. The JobAction associated with the clang-offload-bundler call is the OffloadBundlingJobAction. Passing in the JobAction associated with the host and device compilation will not match the JobAction that is associated with the actual output file we want to remove, so ultimately the file will not be removed when we want it to be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the actual output file we want to remove

What is the actual output file we want to remove in the case where OffloadBundlingJobAction fails?
My understanding is we want to remove the host and device object files, since those will be the only ones created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the OffloadBundlingJobAction fails, we still want to remove the resulting output file, which is the 'fat' object. This is the named object file.o from clang++ -c -fsycl file.cpp -o file.o

// REQUIRES: system-linux

// RUN: touch %t.o
// RUN: not %clangxx -fsycl -Xsycl-target-frontend -DCOMPILE_HOST_FAIL=1 -o %t.o %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this test. Could you please elaborate?
I think you are creating a new object file.
And triggering a host compilation failure.
And the expectation is that the object file created gets removed?

The test description mentions, Verify object removal when the offload compilation fails , but you are triggering a host compilation failure.
Can you trigger any offload compilation action failure instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the macro name - it is a device compilation that I'm triggering the failure - the name carried over while I was experimenting with failure modes.

Yes, the expectation is for any pre-existing object file to be removed if the compilation that would potentially overwrite that file fails. This is standard behavior with compilations creating objects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants